Learning from Guided Play: Improving Exploration for Adversarial Imitation Learning with Simple Auxiliary Tasks
Adversarial imitation learning (AIL) has become a popular alternative to
supervised imitation learning that reduces the distribution shift suffered by
the latter. However, AIL requires effective exploration during an online
reinforcement learning phase. In this work, we show that the standard, naive
approach to exploration can manifest as a suboptimal local maximum if a policy
learned with AIL sufficiently matches the expert distribution without fully
learning the desired task. This can be particularly catastrophic for
manipulation tasks, where the difference between an expert and a non-expert
state-action pair is often subtle. We present Learning from Guided Play (LfGP),
a framework in which we leverage expert demonstrations of multiple exploratory,
auxiliary tasks in addition to a main task. The addition of these auxiliary
tasks forces the agent to explore states and actions that standard AIL may
learn to ignore. Additionally, this particular formulation allows for the
reusability of expert data between main tasks. Our experimental results in a
challenging multitask robotic manipulation domain indicate that LfGP
significantly outperforms both AIL and behaviour cloning, while also using
expert samples more efficiently than these baselines. To explain this performance gap,
we provide further analysis of a toy problem that highlights the coupling
between a local maximum and poor exploration, and also visualize the
differences between the learned models from AIL and LfGP.

Comment: In IEEE Robotics and Automation Letters (RA-L) and presented at the
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'23),
Detroit, MI, USA, Oct. 1-5, 2023. arXiv admin note: substantial text overlap
with arXiv:2112.0893
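To make the AIL mechanism described above concrete, the sketch below shows a per-task
discriminator providing a learned surrogate reward for state-action pairs, with one
discriminator for the main task and one for each exploratory auxiliary task. This is an
illustrative reconstruction only; the class name TaskDiscriminator, the helper ail_reward,
and the example task names are assumptions and do not reflect the authors' released
implementation.

```python
# Illustrative sketch only: per-task discriminator rewards in the spirit of LfGP.
import torch
import torch.nn as nn


class TaskDiscriminator(nn.Module):
    """Binary classifier distinguishing expert (s, a) pairs from policy ones."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))  # raw logits


def ail_reward(disc: TaskDiscriminator, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
    """Common AIL surrogate reward, -log(1 - D(s, a)): large when the
    discriminator believes the pair came from the expert."""
    with torch.no_grad():
        p_expert = torch.sigmoid(disc(obs, act))
    return -torch.log(1.0 - p_expert + 1e-8)


# One discriminator per task: the main task plus exploratory auxiliary tasks.
tasks = ["stack", "reach", "grasp", "lift"]          # hypothetical task set
discs = {t: TaskDiscriminator(obs_dim=16, act_dim=4) for t in tasks}

obs, act = torch.randn(32, 16), torch.randn(32, 4)   # toy (state, action) batch
rewards = {t: ail_reward(d, obs, act) for t, d in discs.items()}
```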
Learning Sequential Latent Variable Models from Multimodal Time Series Data
Sequential modelling of high-dimensional data is an important problem that
appears in many domains including model-based reinforcement learning and
dynamics identification for control. Latent variable models applied to
sequential data (i.e., latent dynamics models) have been shown to be a
particularly effective probabilistic approach to solve this problem, especially
when dealing with images. However, in many application areas (e.g., robotics),
information from multiple sensing modalities is available, yet existing latent
dynamics methods have not yet been extended to make effective use of such
multimodal sequential data. Multimodal sensor streams can be correlated in a
useful manner and often contain complementary information across modalities. In
this work, we present a self-supervised generative modelling framework to
jointly learn a probabilistic latent state representation of multimodal data
and the respective dynamics. Using synthetic and real-world datasets from a
multimodal robotic planar pushing task, we demonstrate that our approach leads
to significant improvements in prediction and representation quality.
Furthermore, we compare to the common learning baseline of concatenating each
modality in the latent space and show that our principled probabilistic
formulation performs better. Finally, despite being fully self-supervised, we
demonstrate that our method is nearly as effective as an existing supervised
approach that relies on ground truth labels.

Comment: In: Petrovic, I., Menegatti, E., Marković, I. (eds) Intelligent
Autonomous Systems 17. IAS 2022. Lecture Notes in Networks and Systems, vol.
577. Springer, Cham
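The abstract contrasts a principled probabilistic fusion of modalities with the baseline
of simply concatenating per-modality latents. One common probabilistic choice, shown below
purely as an illustration and not necessarily the formulation used in the paper, is a
product of Gaussian experts that merges per-modality beliefs over a shared latent state;
the encoder outputs and latent dimension here are hypothetical.

```python
# Illustrative sketch only: product-of-Gaussians fusion of two modality beliefs.
import torch


def product_of_gaussians(mu_a, var_a, mu_b, var_b):
    """Fuse two diagonal Gaussian beliefs over the same latent state."""
    var = 1.0 / (1.0 / var_a + 1.0 / var_b)
    mu = var * (mu_a / var_a + mu_b / var_b)
    return mu, var


# Hypothetical per-modality encoder outputs (e.g., image and force-torque),
# each a mean and diagonal variance over an 8-dimensional latent state.
mu_img, var_img = torch.zeros(8), torch.ones(8)
mu_ft, var_ft = torch.ones(8), 0.5 * torch.ones(8)

mu_z, var_z = product_of_gaussians(mu_img, var_img, mu_ft, var_ft)
```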
Fast Manipulability Maximization Using Continuous-Time Trajectory Optimization
A significant challenge in manipulation motion planning is to ensure agility
in the face of unpredictable changes during task execution. This requires the
identification and possible modification of suitable joint-space trajectories,
since the joint velocities required to achieve a specific end-effector motion
vary with manipulator configuration. For a given manipulator configuration, the
joint space-to-task space velocity mapping is characterized by a quantity known
as the manipulability index. In contrast to previous control-based approaches,
we examine the maximization of manipulability during planning as a way of
achieving adaptable and safe joint space-to-task space motion mappings in
various scenarios. By representing the manipulator trajectory as a
continuous-time Gaussian process (GP), we are able to leverage recent advances
in trajectory optimization to maximize the manipulability index during
trajectory generation. Moreover, the sparsity of our chosen representation
reduces the typically large computational cost associated with maximizing
manipulability when additional constraints exist. Results from simulation
studies and experiments with a real manipulator demonstrate increases in
manipulability, while maintaining smooth trajectories with more dexterous (and
therefore more agile) arm configurations.

Comment: In Proceedings of the IEEE International Conference on Intelligent
Robots and Systems (IROS'19), Macau, China, Nov. 4-8, 2019
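For reference, the manipulability index referred to above is commonly defined (following
Yoshikawa) as m(q) = sqrt(det(J(q) J(q)^T)), where J(q) is the task Jacobian at
configuration q. A minimal sketch, with an arbitrary toy Jacobian standing in for one
computed from a real kinematic model:

```python
import numpy as np


def manipulability(J: np.ndarray) -> float:
    """Yoshikawa manipulability index for a task Jacobian J (m x n, n >= m)."""
    return float(np.sqrt(np.linalg.det(J @ J.T)))


# Toy 2-D task space, 3-joint Jacobian; a real planner would evaluate J(q)
# from the manipulator's kinematic model along the candidate trajectory.
J = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.3]])
print(manipulability(J))  # larger values lie farther from singular configurations
```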
Self-Calibration of Mobile Manipulator Kinematic and Sensor Extrinsic Parameters Through Contact-Based Interaction
We present a novel approach for mobile manipulator self-calibration using
contact information. Our method, based on point cloud registration, is applied
to estimate the extrinsic transform between a fixed vision sensor mounted on a
mobile base and an end effector. Beyond sensor calibration, we demonstrate that
the method can be extended to include manipulator kinematic model parameters,
which involves a non-rigid registration process. Our procedure uses on-board
sensing exclusively and does not rely on any external measurement devices,
fiducial markers, or calibration rigs. Further, it is fully automatic in the
general case. We experimentally validate the proposed method on a custom mobile
manipulator platform, and demonstrate centimetre-level post-calibration
accuracy in positioning of the end effector using visual guidance only. We also
discuss the stability properties of the registration algorithm, in order to
determine the conditions under which calibration is possible.

Comment: In Proceedings of the IEEE International Conference on Robotics and
Automation (ICRA'18), Brisbane, Australia, May 21-25, 2018
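The core operation in contact-based extrinsic calibration of this kind is rigid
registration of corresponding point sets. The sketch below shows a standard SVD-based
(Kabsch) alignment step; it is a simplified illustration and omits the non-rigid
extension and the contact-generation procedure described in the abstract.

```python
import numpy as np


def rigid_align(P: np.ndarray, Q: np.ndarray):
    """Rotation R and translation t minimizing ||R p_i + t - q_i|| for
    corresponding 3-D point sets P and Q, each of shape (N, 3)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t


# Toy check: recover a known rotation and translation from noiseless points.
rng = np.random.default_rng(0)
P = rng.normal(size=(50, 3))
R_true = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])
Q = P @ R_true.T + np.array([0.1, -0.2, 0.3])
R_est, t_est = rigid_align(P, Q)
```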
Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning
Effective exploration continues to be a significant challenge that prevents
the deployment of reinforcement learning for many physical systems. This is
particularly true for systems with continuous and high-dimensional state and
action spaces, such as robotic manipulators. The challenge is accentuated in
the sparse rewards setting, where the low-level state information required for
the design of dense rewards is unavailable. Adversarial imitation learning
(AIL) can partially overcome this barrier by leveraging expert-generated
demonstrations of optimal behaviour and providing, essentially, a replacement
for dense reward information. Unfortunately, the availability of expert
demonstrations does not necessarily improve an agent's capability to explore
effectively and, as we empirically show, can lead to inefficient or stagnated
learning. We present Learning from Guided Play (LfGP), a framework in which we
leverage expert demonstrations of multiple auxiliary tasks in addition to a
main task. Subsequently, a hierarchical model is used to learn each task
reward and policy through a modified AIL procedure, in which exploration of all
tasks is enforced via a scheduler composing different tasks together. This
affords many benefits: learning efficiency is improved for main tasks with
challenging bottleneck transitions, expert data becomes reusable between tasks,
and transfer learning through the reuse of learned auxiliary task models
becomes possible. Our experimental results in a challenging multitask robotic
manipulation domain indicate that our method compares favourably to supervised
imitation learning and to a state-of-the-art AIL method. Code is available at
https://github.com/utiasSTARS/lfgp.

Comment: In Proceedings of the Neural Information Processing Systems
(NeurIPS'21) Deep Reinforcement Learning Workshop, Sydney, Australia, Dec.
13, 2021
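As a rough illustration of task composition by a scheduler, the sketch below samples
which task (main or auxiliary) the agent pursues in each trajectory segment. The task
names and the biased-uniform sampling rule are assumptions made for illustration and are
not taken from the released code.

```python
# Illustrative sketch only: a scheduler composing main and auxiliary tasks.
import random

TASKS = ["main-stack", "reach", "grasp", "lift", "move-object"]  # hypothetical task set


def sample_schedule(num_segments: int, main_task_prob: float = 0.5) -> list:
    """Choose which task the agent pursues in each trajectory segment."""
    schedule = []
    for _ in range(num_segments):
        if random.random() < main_task_prob:
            schedule.append(TASKS[0])                    # bias toward the main task
        else:
            schedule.append(random.choice(TASKS[1:]))    # explore an auxiliary task
    return schedule


print(sample_schedule(num_segments=6))
```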